Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: explicitly specify column groups for storage v2 api #39790

Merged
merged 1 commit into from
Feb 21, 2025

Conversation

shaoting-huang
Copy link
Contributor

@shaoting-huang shaoting-huang commented Feb 11, 2025

  • use the new packed reader and writer api to be compatible with current etcd meta
  • For the new packed writer API: column groups and paths are explicitly defined by users and won't split column groups by memory in storage v2. Packed writer follows the user-defined column groups to split arrow record and write into the corresponding file path.
  • For the new packed reader API: read paths are explicitly defined by users.
    related: [Feature]: storage v2 support #39173

@sre-ci-robot sre-ci-robot added size/L Denotes a PR that changes 100-499 lines. area/compilation labels Feb 11, 2025
@mergify mergify bot added dco-passed DCO check passed. kind/feature Issues related to feature request from users labels Feb 11, 2025
Copy link
Contributor

mergify bot commented Feb 11, 2025

@shaoting-huang E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Copy link
Contributor

mergify bot commented Feb 11, 2025

@shaoting-huang go-sdk check failed, comment rerun go-sdk can trigger the job again.

Copy link

codecov bot commented Feb 11, 2025

Codecov Report

Attention: Patch coverage is 82.03883% with 37 lines in your changes missing coverage. Please review.

Project coverage is 80.59%. Comparing base (0d87371) to head (d8aec30).
Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
internal/storagev2/packed/packed_writer.go 70.00% 5 Missing and 4 partials ⚠️
internal/core/src/segcore/packed_reader_c.cpp 50.00% 7 Missing ⚠️
internal/core/src/segcore/packed_writer_c.cpp 64.70% 6 Missing ⚠️
internal/storagev2/packed/packed_reader.go 62.50% 3 Missing and 3 partials ⚠️
internal/core/src/segcore/arrow_fs_c.cpp 90.47% 4 Missing ⚠️
internal/storagev2/packed/util.go 50.00% 4 Missing ⚠️
internal/core/src/segcore/column_groups_c.cpp 93.33% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #39790       +/-   ##
===========================================
+ Coverage   69.61%   80.59%   +10.97%     
===========================================
  Files         307     1468     +1161     
  Lines       27491   205533   +178042     
===========================================
+ Hits        19139   165642   +146503     
- Misses       8352    33872    +25520     
- Partials        0     6019     +6019     
Components Coverage Δ
Client 79.25% <ø> (∅)
Core 69.90% <79.54%> (+0.28%) ⬆️
Go 82.39% <83.89%> (∅)
Files with missing lines Coverage Δ
internal/storage/serde_events.go 75.03% <ø> (ø)
internal/storage/serde_events_v2.go 61.71% <100.00%> (ø)
internal/util/initcore/init_core.go 90.32% <100.00%> (ø)
internal/core/src/segcore/column_groups_c.cpp 93.33% <93.33%> (ø)
internal/core/src/segcore/arrow_fs_c.cpp 90.47% <90.47%> (ø)
internal/storagev2/packed/util.go 50.00% <50.00%> (ø)
internal/core/src/segcore/packed_writer_c.cpp 73.17% <64.70%> (+73.17%) ⬆️
internal/storagev2/packed/packed_reader.go 72.72% <62.50%> (ø)
internal/core/src/segcore/packed_reader_c.cpp 44.18% <50.00%> (+44.18%) ⬆️
internal/storagev2/packed/packed_writer.go 78.18% <70.00%> (ø)

... and 1154 files with indirect coverage changes

@sre-ci-robot sre-ci-robot added size/XL Denotes a PR that changes 500-999 lines. and removed size/L Denotes a PR that changes 100-499 lines. labels Feb 13, 2025
Copy link
Contributor

mergify bot commented Feb 13, 2025

@shaoting-huang go-sdk check failed, comment rerun go-sdk can trigger the job again.

1 similar comment
Copy link
Contributor

mergify bot commented Feb 14, 2025

@shaoting-huang go-sdk check failed, comment rerun go-sdk can trigger the job again.

Copy link
Contributor

mergify bot commented Feb 15, 2025

@shaoting-huang cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

1 similar comment
Copy link
Contributor

mergify bot commented Feb 16, 2025

@shaoting-huang cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@shaoting-huang shaoting-huang changed the title feat: storage v2 packed api integration feat: explicitly specify column groups for storage v2 api Feb 17, 2025
Copy link
Contributor

mergify bot commented Feb 17, 2025

@shaoting-huang cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Copy link
Contributor

mergify bot commented Feb 18, 2025

@shaoting-huang go-sdk check failed, comment rerun go-sdk can trigger the job again.

Copy link
Contributor

mergify bot commented Feb 18, 2025

@shaoting-huang cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Copy link
Contributor

mergify bot commented Feb 18, 2025

@shaoting-huang E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Copy link
Contributor

mergify bot commented Feb 20, 2025

@shaoting-huang go-sdk check failed, comment rerun go-sdk can trigger the job again.

@shaoting-huang shaoting-huang force-pushed the storagev2_api branch 2 times, most recently from 1ee5374 to 9766510 Compare February 20, 2025 12:06
Copy link
Contributor

mergify bot commented Feb 20, 2025

@shaoting-huang E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Copy link
Contributor

mergify bot commented Feb 20, 2025

@shaoting-huang cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@shaoting-huang
Copy link
Contributor Author

/run-cpu-e2e

@shaoting-huang
Copy link
Contributor Author

rerun cpp-unit-test

@@ -95,6 +95,7 @@ typedef struct CStorageConfig {
bool useVirtualHost;
int64_t requestTimeoutMs;
const char* gcp_credential_json;
bool use_custom_part_upload;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any chance that use_custom_part_upload can be false? We can remove it from StorageConfig if it is actually not configurable.

Comment on lines +38 to +40
CleanLocalArrowFileSystemSingleton() {
milvus_storage::ArrowFileSystemSingleton::GetInstance().Release();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to have two separate CleanXXX functions

field2Col: field2Col,
}, nil
}

func NewPackedDeserializeReader(path string, schema *schemapb.CollectionSchema,
bufferSize int, pkFieldID FieldID,
func NewPackedDeserializeReader(paths []string, schema *schemapb.CollectionSchema,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a 2D array paths to support chunking.

if status != 0 {
return errors.New("PackedWriter: failed to write record batch")
if err := ConsumeCStatusIntoError(&status); err != nil {
return err
}

return nil
}

func (pw *PackedWriter) Close() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some cases, we need to collect necessary stats and flush them on Close. We can add such support in future PRs.

Copy link
Contributor

mergify bot commented Feb 21, 2025

@shaoting-huang cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Copy link
Contributor

mergify bot commented Feb 21, 2025

@shaoting-huang E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

1 similar comment
Copy link
Contributor

mergify bot commented Feb 21, 2025

@shaoting-huang E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Copy link
Contributor

mergify bot commented Feb 21, 2025

@shaoting-huang cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@tedxu
Copy link
Contributor

tedxu commented Feb 21, 2025

/lgtm
/approve

@sre-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shaoting-huang, tedxu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mergify mergify bot added the ci-passed label Feb 21, 2025
@sre-ci-robot sre-ci-robot merged commit 3eb3af5 into milvus-io:master Feb 21, 2025
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved area/compilation ci-passed dco-passed DCO check passed. kind/feature Issues related to feature request from users lgtm size/XL Denotes a PR that changes 500-999 lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants